Abstract

We apply percolation theory to a recently proposed measure of fragmentation $F$ for social
networks. The measure $F$ is defined as the ratio between the number of pairs of nodes that are
not connected in the fragmented network after removing a fraction $q$ of nodes and the total
number of pairs in the original fully connected network. We compare $F$ with the traditional
measure used in percolation theory, $P_\infty$, the fraction of nodes in the largest cluster
relative to the total number of nodes. Using both analytical and numerical methods from
percolation, we study Erdős-Rényi (ER) and scale-free (SF) networks under three node removal
strategies: random removal, high degree removal and high betweenness centrality removal. We
find that for a network obtained after removal (all strategies) of a fraction $q$ of nodes
above the percolation threshold, $P_\infty \approx (1 - F)^{1/2}$. For fixed $P_\infty$ and
close to the percolation threshold ($q = q_c$), we show that $1 - F$ better reflects the actual
fragmentation. Close to $q_c$, for a given $P_\infty$, $1 - F$ has a broad distribution and it
is thus possible to improve the fragmentation of the network. We also study and compare the
fragmentation measure $F$ and the percolation measure $P_\infty$ for a real social network of
workplaces linked by the households of the employees and find similar results.
I. INTRODUCTION



Many physical, sociological and biological systems are represented by complex networks. One of
the important problems in complex networks is the fragmentation of networks. In this problem
one studies the statistical properties of the fragmented networks after removing nodes (or
links) from the original fully connected network using a certain strategy. Many different
removal strategies have been developed for various purposes, e.g., mimicking real world network
failures or improving the effectiveness of network disintegration. Examples include the random
removal (RR) strategy, the high degree removal (HDR) strategy and the high betweenness
centrality removal (HBR) strategy [9, 18, 19, 20, 21]. Note that the best strategy for
fragmentation (minimum node removal) is also the best for immunization, since it represents
the minimum number of nodes or links that need to be immunized so that an epidemic cannot
spread in the network.

Recently, a new measure of fragmentation has been developed in social network studies [22].
Consider a fully connected network of $N$ nodes that is fragmented into separate clusters by
removing $m$ nodes following a certain strategy. We define $q \equiv m/N$ as the concentration
of nodes removed and $p \equiv 1 - q$ as the concentration of existing nodes. The degree of
fragmentation $F$ of the network is defined as the ratio between the number of pairs of nodes
that are not connected in the fragmented network and the total number of pairs in the original
fully connected network. Suppose that after removal there are $n$ clusters in the fragmented
network. Since all members of a cluster are, by definition, mutually reachable, the measure
$F$ can be written as follows:

$$F \equiv 1 - \frac{\sum_{j=1}^{n} N_j (N_j - 1)}{N(N - 1)} \equiv 1 - C. \qquad (1)$$

Here, $N_j$ is the number of nodes in cluster $j$, $n$ is the number of clusters in the
fragmented network, and $N$ is the number of nodes in the original fully connected network.
For an undamaged network, $F = 0$. For a totally fragmented network, $F = 1$. The quantity $C$
defined in Eq. (1) can be regarded as the "connectivity" of the network. When $C = 1$ the
network is fully connected, while for $C = 0$ it is fully fragmented.
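As an illustration of Eq. (1), the following minimal Python sketch computes $C$ and $F$ from a list of cluster sizes. The function name and argument names are hypothetical and chosen only for this example; the sketch assumes the cluster sizes $N_j$ have already been obtained by some cluster-finding step.

```python
from typing import Iterable, Tuple


def connectivity_and_fragmentation(
    cluster_sizes: Iterable[int], n_original: int
) -> Tuple[float, float]:
    """Return (C, F) of Eq. (1).

    cluster_sizes: sizes N_j of the clusters that remain after node removal.
    n_original:    N, the number of nodes in the original network.
    """
    if n_original < 2:
        raise ValueError("at least two nodes are needed to form a pair")
    sizes = list(cluster_sizes)
    if sum(sizes) > n_original:
        raise ValueError("clusters cannot contain more nodes than the original network")
    # Ordered pairs of mutually reachable nodes inside each cluster.
    connected_pairs = sum(s * (s - 1) for s in sizes)
    # Ordered pairs in the original network; removed nodes still count here.
    total_pairs = n_original * (n_original - 1)
    c = connected_pairs / total_pairs
    return c, 1.0 - c


if __name__ == "__main__":
    # N = 10, two nodes removed, survivors split into clusters of 5 and 3 nodes.
    print(connectivity_and_fragmentation([5, 3], 10))
```

Removed nodes enter only through the denominator, so removing nodes always lowers $C$ even if the survivors stay in a single cluster.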

In this paper, we study the statistical behavior of $F \equiv 1 - C$ using both analytical and
numerical methods and relate it to the traditional measure of fragmentation, the relative
size of the largest cluster, $P_\infty$, used in percolation theory. In this way, we are able
to obtain analytical results for the fragmentation $F$ of networks. We study three removal
strategies: the random removal (RR) strategy, which removes randomly selected nodes; the high
degree removal (HDR) strategy, which targets and removes nodes with the highest degree; and the
high betweenness centrality removal (HBR) strategy, which targets and removes nodes with the
highest betweenness centrality. The HDR (or HBR) strategy first removes the node with the
highest degree (or the highest betweenness centrality), then the second highest, and so on.
These three strategies are commonly used in models representing random and targeted attacks in
real world networks.
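The RR and HDR strategies can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results reported here: the graph representation (a dictionary of adjacency sets) and all function names are assumptions, and HDR is assumed to rank nodes by their degree in the original network without recalculating after each removal. HBR is omitted because it additionally requires a betweenness centrality computation.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph stored as adjacency sets


def random_removal_order(graph: Graph, rng: random.Random) -> List[int]:
    """RR: nodes in uniformly random order."""
    nodes = list(graph)
    rng.shuffle(nodes)
    return nodes


def high_degree_removal_order(graph: Graph) -> List[int]:
    """HDR: nodes sorted by initial degree, highest first (ties broken by node id)."""
    return sorted(graph, key=lambda v: (-len(graph[v]), v))


def remove_nodes(graph: Graph, order: List[int], m: int) -> Graph:
    """Return a copy of the graph with the first m nodes of `order` removed."""
    removed = set(order[:m])
    return {v: nbrs - removed for v, nbrs in graph.items() if v not in removed}


if __name__ == "__main__":
    # A star: node 0 is the hub, nodes 1..4 are leaves.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(remove_nodes(star, high_degree_removal_order(star), 1))
    print(remove_nodes(star, random_removal_order(star, random.Random(0)), 1))
```

On the star example, HDR removes the hub first and isolates every leaf, whereas RR most often removes a leaf and leaves the remaining nodes connected.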

II. THEORY 

Traditionally, in analogy to percolation, physicists describe the connectivity of a
fragmented network by the ratio $P_\infty \equiv N_\infty/N$ (called the incipient order
parameter) between the largest cluster size $N_\infty$ (called the infinite cluster) and $N$.
Many properties have been derived for this measure. For example, in random networks,
$P_\infty$ undergoes a second order phase transition at a threshold $p_c$. Below $p_c$,
$P_\infty$ is zero for $N \to \infty$, while for $p > p_c$, $P_\infty$ is finite. This occurs
for both RR and HDR in random networks and lattice networks. The threshold $p_c$ depends on the
degree distribution, the network topology, and the removal strategy [25]. The specific way
that $P_\infty$ approaches zero at $p_c$ depends on the network topology and removal strategy
but not on details such as $p_c$. In scale free networks, where the degree distribution is
$P(k) \sim k^{-\lambda}$ and $2 < \lambda < 3$, it has been found that $p_c \to 0$ for the RR
strategy, while $p_c$ is very high for the HDR strategy and for the HBR strategy. For
$\lambda > 3$ and RR, $p_c$ is finite.

Next, we show simulation results of removing nodes in all strategies (RR, HDR and
HBR) on ER and scale free networks. Fig. 1 shows the behavior of $C$ ($\equiv 1 - F$) and
$P_\infty$ versus $q$ for Erdős-Rényi (ER) and scale-free (SF) networks with RR
(Fig. 1(a),(b)), HDR (Fig. 1(c),(d)) and HBR (Fig. 1(e),(f)) strategies. As seen in Fig. 1(a),
the network becomes more fragmented when $q$ increases and both measures drop sharply at
$q_c \equiv 1 - p_c$. Note that $C$ shows a transition similar to $P_\infty$ at $p = p_c$;
however, above $q_c$, $C$ becomes flatter in contrast to $P_\infty$, indicating the effect of
connectivity in the small clusters, which do not affect $P_\infty$.

In contrast to Fig. 1(a), the transition in Fig. 1(b) is not as sharp and therefore $C$ and
$P_\infty$ do not show a collapse together. The reason is that for $\lambda = 2.5$ there is no
transition at $q < 1$ [6] and for $\lambda = 3.5$, $P_\infty$ falls much less sharply compared
to ER [26]. For HDR, shown in Figs. 1(c),(d), the transition is again sharp since after
removing high degree nodes, the network becomes similar to ER networks, which do not have high
degree nodes. A similar behavior is seen for HBR, shown in Figs. 1(e),(f), due to the known
high correlation between high degree nodes and high betweenness centrality nodes.

When $p > p_c$ and not too close to $p_c$, following percolation theory, the infinite cluster
dominates the system and $P_\infty \sim p$, i.e., most of the unremoved nodes are connected.
Thus, we assume that the small clusters will have a small effect on $C$ compared to the
largest one. Using this assumption, Eq. (1) can be written as

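$$C \equiv 1 - F = \frac{\sum_{j=1}^{n} N_j (N_j - 1)}{N(N - 1)} \approx \frac{N_\infty (N_\infty - 1)}{N(N - 1)} \approx \frac{N_\infty^2}{N^2} \equiv P_\infty^2. \qquad (2)$$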

Therefore, we expect $P_\infty$ and $C$ to have the relationship $P_\infty \approx C^{1/2}$
when $p > p_c$ (but not too close to $p_c$). When $p < p_c$, the infinite cluster loses its
dominance in the system and $P_\infty \sim \ln(N)/N \to 0$ for large $N$. Here significant
variations between $P_\infty$ and $C^{1/2}$ are expected, as indeed seen in Fig. 1.
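The approximation $P_\infty \approx C^{1/2}$ and its breakdown can be checked on hand-made cluster size lists. The sketch below is only an arithmetic illustration: the two size lists are invented for this example (they are not simulation output), and the function names are hypothetical.

```python
from typing import List


def p_infinity(cluster_sizes: List[int], n_original: int) -> float:
    """Relative size of the largest cluster, N_inf / N."""
    return max(cluster_sizes, default=0) / n_original


def sqrt_connectivity(cluster_sizes: List[int], n_original: int) -> float:
    """C^(1/2), with C the sum of N_j (N_j - 1) divided by N (N - 1)."""
    pairs = sum(s * (s - 1) for s in cluster_sizes)
    return (pairs / (n_original * (n_original - 1))) ** 0.5


if __name__ == "__main__":
    n = 10_000
    # Invented "well above threshold" case: one dominant cluster plus small debris.
    dominated = [6000] + [2] * 500 + [1] * 1000
    # Invented "no dominant cluster" case: many clusters of comparable size.
    fragmented = [60] + [50] * 60
    for name, sizes in (("dominated", dominated), ("fragmented", fragmented)):
        print(name, p_infinity(sizes, n), sqrt_connectivity(sizes, n))
```

In the first case the two numbers nearly coincide; in the second, the many mid-sized clusters contribute to $C$ but not to $P_\infty$, so $C^{1/2}$ is several times larger.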



III. SIMULATIONS 

We test by simulations the relationship $C \approx P_\infty^2$ derived for $p > p_c$ in
Eq. (2). In Fig. 2(a) we plot $P_\infty$ vs $C^{1/2}$ for the RR strategy in ER networks and
for several values of $p$. As predicted by Eq. (2), the plot of $P_\infty$ vs $C^{1/2}$ yields
a linear relationship with slope equal to 1 when $p > p_c = 1/\langle k \rangle = 1/3$. The
range of $P_\infty$ and $C^{1/2}$ for $p = 0.4$ is due to the variation of $P_\infty$ for a
given $p$, and the same variation appears for $C^{1/2}$, showing that the infinite cluster
dominates and Eq. (2) is valid. However, when $p$ drops close to $p_c = 1/3$, the system
approaches criticality and the one-to-one correspondence between $C^{1/2}$ and $P_\infty$ is
not as strong. This variation is attributed to the presence of clusters other than the
infinite one, which influence $C$ but not $P_\infty$.
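A small-scale version of this test can be sketched in pure Python. The code below is an illustrative sketch under stated assumptions, not the simulation code behind the figures: the ER graph is sampled with a fixed number of edges, RR removes exactly $(1 - p)N$ nodes, clusters are found with a union-find structure, and the system size is far smaller than in the paper. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Set, Tuple


def er_edges(n: int, mean_degree: float, rng: random.Random) -> Set[Tuple[int, int]]:
    """Sample n * mean_degree / 2 distinct random edges (fixed-edge-count ER variant)."""
    m = int(n * mean_degree / 2)
    edges: Set[Tuple[int, int]] = set()
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges


def cluster_sizes(n: int, edges: Set[Tuple[int, int]], kept: Set[int]) -> List[int]:
    """Sizes of the connected clusters among the kept nodes (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        if a in kept and b in kept:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return list(Counter(find(v) for v in kept).values())


def measure(n: int, mean_degree: float, p: float, seed: int) -> Tuple[float, float]:
    """Return (P_inf, C^(1/2)) after random removal of a fraction 1 - p of the nodes."""
    rng = random.Random(seed)
    edges = er_edges(n, mean_degree, rng)
    kept = set(rng.sample(range(n), int(p * n)))
    sizes = cluster_sizes(n, edges, kept)
    p_inf = max(sizes) / n
    c = sum(s * (s - 1) for s in sizes) / (n * (n - 1))
    return p_inf, c ** 0.5


if __name__ == "__main__":
    for p in (0.8, 0.5, 0.35, 0.2):
        print(p, measure(5000, 3.0, p, seed=1))
```

Repeating `measure` over many seeds at fixed $p$ gives a scatter of $P_\infty$ against $C^{1/2}$ that can be compared with the one described for Fig. 2(a).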

Similar behavior is observed for the RR strategy in SF networks with $\lambda = 3.5$, shown in
Fig. 2(b). For $\lambda = 3.5$, the variation in $C^{1/2}$ emerges close to $p_c = 0.2$.
However, for $\lambda = 2.5$, percolation theory suggests that $p_c$ approaches 0 for large
systems. As a result, no significant variation is observed even when $P_\infty$ is as small as
$5 \times 10^{-4}$. This observation supports the view that SF networks with $\lambda < 3$ are
quite robust in sustaining their infinite cluster against random removal. Figs. 2(c), (d), (e)
and (f) show the results for the HDR and HBR strategies in ER and SF networks. For these
targeted strategies, the variation between $C^{1/2}$ and $P_\infty$ shows up at significantly
higher $p$ compared to the random case, indicating that the infinite cluster breaks down more
easily under HDR and HBR attacks for both ER and SF networks, as seen also in Fig. 1. Here the
SF network with $\lambda = 2.5$ is no longer as robust as in the random case, as can be clearly
observed in the large variation at $P_\infty \approx 0.05$.

To further investigate the characteristics of the variation of $C$ for a given $P_\infty$, we
calculate the probability distributions $p(C)$ versus $C/\bar{C}$ for a given $P_\infty$,
where $\bar{C}$ is the average value of $C$; the results are plotted in Fig. 3. In this case,
$C^*$, the most probable value of $C$, is determined by the fixed infinite cluster size
$P_\infty$ with $C^* \approx P_\infty^2$, and the broadness of $p(C)$ comes from the presence
of clusters other than the infinite one. Because the largest cluster size is fixed, the upper
cutoff of $p(C)$ emerges due to the limitation on the sizes of the other clusters, which by
definition must be smaller than the largest cluster. For the RR strategy, the broadness of
$p(C)$ for ER networks is larger than that of SF networks at the same $P_\infty$, especially
for $\lambda = 2.5$, where the system is always far above criticality and the variation is
relatively small. In contrast, for the HDR and HBR strategies, the broadness of $p(C)$ for ER
and SF networks is of the same order, because for HDR and HBR $p_c$ is also finite for
$\lambda = 2.5$. This observation is consistent with the results shown in Fig. 2.

Now we focus on the dependence of $p(C)$ on the system size $N$ at $p_c$ (Fig. 4). From
percolation theory and for ER under the RR strategy, the infinite cluster size $N_\infty$ at
criticality behaves as

$$N_\infty \sim N^{2/3}. \qquad (3)$$

Since $C$ follows similar behavior to $P_\infty^2$ at criticality, we expect $C$ for
$p = p_c$ to behave as

$$C \equiv 1 - F \approx (N_\infty/N)^2 \sim N^{-2/3}. \qquad (4)$$

Thus, we expect the probability distribution $p(C)$ with $p = p_c$ to scale as

$$p(C) = N^{2/3} g(C N^{2/3}), \qquad (5)$$

where $g$ is a scaling function.
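The data collapse implied by Eq. (5) amounts to plotting $p(C) N^{-2/3}$ against $C N^{2/3}$ for each system size. The Python sketch below shows only this rescaling step. Since the form of $g$ is not given, the example uses a made-up function $g(x) = e^{-x}$ purely to generate synthetic input; this choice and the function names are assumptions of the example.

```python
import math
from typing import List, Sequence, Tuple


def rescale_distribution(
    c_values: Sequence[float], p_values: Sequence[float], n: int
) -> Tuple[List[float], List[float]]:
    """Map points (C, p(C)) measured at size n to (C * n^(2/3), p(C) * n^(-2/3))."""
    factor = n ** (2.0 / 3.0)
    x = [c * factor for c in c_values]
    y = [p / factor for p in p_values]
    return x, y


def synthetic_p_of_c(c: float, n: int) -> float:
    """Hypothetical p(C) obeying Eq. (5) with the made-up choice g(x) = exp(-x)."""
    factor = n ** (2.0 / 3.0)
    return factor * math.exp(-c * factor)


if __name__ == "__main__":
    for n in (50_000, 100_000, 200_000):
        factor = n ** (2.0 / 3.0)
        c_grid = [x / factor for x in (0.5, 1.0, 2.0)]
        p_grid = [synthetic_p_of_c(c, n) for c in c_grid]
        # After rescaling, the printed points should agree for all three sizes.
        print(n, rescale_distribution(c_grid, p_grid, n))
```

With measured histograms in place of the synthetic input, curves for different $N$ that fall on top of each other after this rescaling indicate that the scaling form holds.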

Fig. 4 supports this scaling relationship. We calculate $p(C)$ for the RR strategy at
criticality on ER networks with $N$ values of 50000, 100000, 200000 and
$\langle k \rangle = 3$ (shown in Fig. 4(a)), and find a good collapse when plotted
(Fig. 4(b)) using the scaling form of Eq. (5).



IV. REAL NETWORKS 



The ER networks and the SF networks that we have been studying are random ensembles of
networks which are determined only by their degree distribution. It is known that many real
networks exhibit important structural properties relevant for percolation, such as a high
level of clustering, assortativity and fractality, that random networks do not exhibit [29].
We therefore test our results about the relation between $C$ and $P_\infty$ on
an example of a large real social network. The network we use is extracted from a data set
obtained from Statistics Sweden and consists of all geographical workplaces in Sweden that can
be linked with each other by having at least one employee from each workplace sharing the same
household. A household is defined as a married couple, or a couple having kids together, living
in the same flat or house. Unmarried couples without kids and other individuals sharing a
household are not registered in the dataset as households. This kind of network has been shown
to be of importance for the spreading of influenza [31] and is also likely to be important for
the spreading of information and rumors in society. The network consists of 310136 nodes
(workplaces) and 906260 links (employees sharing the same households) and, as shown in
Fig. 5(a), is approximately a SF network with $\lambda \approx 2.6$ and an exponential cutoff.
The network shows almost no degree-degree correlation (assortativity) (Fig. 5(b)). However,
the workplace network clustering coefficient $c$ is significantly higher than that of a random
SF network with the same $\lambda$ and $N$ (Fig. 5(c)). The average of $c$ is 0.048 for the
workplace network versus $3.2 \times 10^{-4}$ for the random SF networks, which is consistent
with earlier social network studies [32, 33]. Fig. 5(d) shows the node distribution $n(k_s)$
of the k-shell ($k_s$) in the network compared to that of a random SF network with the same
$\lambda$ and $\langle k \rangle$ [34]. It is seen that in the workplace network there exist
significantly more shells and the large shells are more occupied compared to the random SF
network. The distribution $n(k_s)$ shows a power-law behavior with slope $-1.52$. This
indicates the structure of this real network.
Fig. 5(e) shows the crust total size, the largest cluster size and the second largest cluster
size as a function of shell $k_s$. It is seen that the largest cluster has two transitions,
one around $k_s = 5$ and the other at $k_s = 27$. At $k_s > 5$, the largest cluster increases
from zero to a finite fraction of the network. This transition is related to the HDR
transition seen in Fig. 6(d). The second transition at $k_s = 27$ defines the nucleus of the
workplace network, which includes about 100 nodes (see Fig. 5(d), $n(28) \approx 100$) that
are well connected to each other. The jump of the largest cluster between $k_s = 27$ and
$k_s = 28$, from $2.8 \times 10^5$ nodes to $3.1 \times 10^5$ nodes (i.e., $3 \times 10^4$
nodes), is due to nodes which are connected only to the nucleus. These nodes are called
dendrites. Fig. 5(e) is very similar to the Medusa model suggested for the AS topology of the
Internet. Figs. 6(a) and 6(b) show simulation results for several values of $p$ for
$P_\infty$ vs $C^{1/2}$. The curves are linear, similar to Fig. 2 for our model networks.
Moreover, Figs. 6(c) and (d) show that $C^{1/2}$ and $P_\infty$ are almost identical above the
criticality threshold $p_c$ for a typical configuration after both RR and HDR. For $p$ below
criticality, differences appear which are especially obvious for the HDR strategy, where
$q_c \equiv 1 - p_c$ is relatively small. While $P_\infty$ rapidly decreases to a very small
value (below $10^{-5}$), a plateau shows up in the curve of $C^{1/2}$ due to the influence of
the small clusters.
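The k-shell and k-crust quantities used above can be obtained by iterative pruning. The Python sketch below uses the standard k-core pruning procedure, which is assumed here to match the decomposition used for Fig. 5; the graph representation (a dictionary of adjacency sets) and the function names are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph stored as adjacency sets


def k_shell_index(graph: Graph) -> Dict[int, int]:
    """Assign each node its shell index k_s by repeatedly pruning low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    remaining = set(graph)
    shell: Dict[int, int] = {}
    while remaining:
        # The smallest remaining degree sets the index of the next shell.
        k = min(degree[v] for v in remaining)
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned through another neighbor
            remaining.discard(v)
            shell[v] = k
            for u in graph[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)  # pruning v pushed u into the same shell
    return shell


def k_crust(shell: Dict[int, int], k: int) -> Set[int]:
    """Nodes of the k-crust: the union of all shells with index at most k."""
    return {v for v, ks in shell.items() if ks <= k}


if __name__ == "__main__":
    # Triangle 0-1-2, a pendant node 3 attached to node 0, and an isolated node 4.
    g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: set()}
    shells = k_shell_index(g)
    print(shells)
    print(k_crust(shells, 1))
```

The largest and second largest clusters of each crust can then be measured by running any cluster-finding routine on the subgraph induced by `k_crust(shells, k)`.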



V. SUMMARY 



In summary, we study the measure for fragmentation $F \equiv 1 - C$ proposed in social
sciences and relate it to the traditional $P_\infty$ used in physics in percolation theory.
For $p$ above criticality, $C$ and $P_\infty$ are highly correlated and
$C \approx P_\infty^2$. Close to criticality, for $p > p_c$ and below $p_c$, variations
between $C$ and $P_\infty$ emerge due to the presence of the small clusters. For systems close
to or below criticality, $F$ gives a better measure of the fragmentation of the whole system
than $P_\infty$. We study the probability distribution $p(C)$ for a given $P_\infty$ and find
that $p(C)$ at $p = p_c$ obeys the scaling relationship $p(C) = N^{2/3} g(C N^{2/3})$ both for
the RR strategy on ER networks and for HDR on scale free networks.

We thank ONR, European NEST project DYSONET, and Israel Science Foundation for 
financial support. 

The study was approved by the Regional Ethical Review board in Stockholm (record 
2004/2:9). 
FIG. 1: The behavior of $C$ and $P_\infty$ versus $q$ on ER and SF networks. For ER networks,
$N = 200000$ and $\langle k \rangle = 3$. For SF networks, $N = 80000$. The graphs are (a) RR
strategy on ER networks, (b) RR strategy on SF networks, (c) HDR strategy on ER networks, (d)
HDR strategy on SF networks, (e) HBR strategy on ER networks and (f) HBR strategy on SF
networks.

FIG. 2: Relationship between $C^{1/2}$ and $P_\infty$ for ER and SF networks with system size
$N = 50000$. For ER networks, the average degree $\langle k \rangle = 3$, and for SF networks,
$\lambda = 2.5$ and 3.5. The graphs are (a) RR strategy on ER networks, (b) RR strategy on SF
networks, (c) HDR strategy on ER networks, (d) HDR strategy on SF networks, (e) HBR strategy
on ER networks and (f) HBR strategy on SF networks.








FIG. 3: Probability distributions $p(C/\bar{C})$ versus $C/\bar{C}$ for several values of
$P_\infty$ and for ER networks with $\langle k \rangle = 3$, $N = 200000$ and SF networks with
$N = 80000$ and $\lambda = 2.5$ and 3.5. (a) RR strategy on ER networks, (b) RR strategy on SF
networks, (c) HDR strategy on ER networks and (d) HDR strategy on SF networks.

FIG. 4: The dependence of $p(C)$ on the system size $N$ with $p = p_c$ for (a) before scaling
and (b) after scaling. Simulations are performed on ER networks with $\langle k \rangle = 3$.



[Fig. 5 legend: workplace, $k = 10$; workplace, $k = 100$; random $\lambda = 2.6$, $k = 10$; random $\lambda = 2.6$, $k = 100$]





FIG. 5: Properties of the Swedish network of workplaces. (a) The cumulative degree
distribution (showing $\lambda = 2.6$). (b) The distribution of $k_n$, the degree of the
neighbors of nodes having degree $k$. (c) The cumulative distribution of clustering
coefficient $c$. (d) Number of nodes in shell $k_s$. (e) Size of largest and second largest
cluster in each k-crust. In (b), (c) and (d) the distributions of random SF networks with the
same $\lambda$ and $N$ are plotted for comparison.

FIG. 6: $P_\infty$ vs $C^{1/2}$ for (a) RR strategy and (b) HDR strategy, and $C^{1/2}$ and
$P_\infty$ versus $q$ for (c) RR strategy and (d) HDR strategy, for the Swedish network of
workplaces with $N = 310136$ nodes.